1. Introduction

The Weibull distribution is very versatile and has found many uses in
reliability and in the climatological modeling of weather elements. In
this study, we compare several alternatives to maximum likelihood (ML)
estimation. Somerville and Bean (1982) compared ML and least squares (LS)
and found that under ideal conditions LS and ML gave substantially the same
results. However, when contamination or censoring occurs, or when the
wrong model is used, LS can give substantially better results. In addition,
ML estimation of the parameters of the Weibull distribution requires
iterative techniques. The alternatives in this study are considered for
two reasons: they are more robust, and most of them are much easier to
compute.

It is common to evaluate estimators on the basis of their variances
and biases. Although these are important considerations, a user is
frequently more interested in how well the model is going to predict
probabilities. That is, we are more interested in the fit of the cumulative
distribution than in the values of the parameters. We will therefore
evaluate the alternatives on the basis of the fit to the cumulative
distribution. We use the following form of the cumulative distribution
function:

$$F(x) = 1 - \exp(-\alpha x^{\beta}), \qquad x, \alpha, \beta > 0. \tag{1}$$

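As a minimal sketch of the parameterization in (1), the following Python
functions evaluate the CDF, invert it, and draw an ordered sample by
inversion. The function names and the use of inverse-CDF sampling are
illustrative assumptions; the report does not say how its samples were
generated.

```python
import math
import random

def weibull_cdf(x, alpha, beta):
    """CDF in the form of equation (1): F(x) = 1 - exp(-alpha * x**beta)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-alpha * x ** beta)

def weibull_quantile(p, alpha, beta):
    """Inverse of equation (1) for 0 <= p < 1."""
    return (-math.log(1.0 - p) / alpha) ** (1.0 / beta)

def weibull_sample(n, alpha, beta, rng):
    """Ordered sample of size n drawn by inversion (an assumed method)."""
    return sorted(weibull_quantile(rng.random(), alpha, beta) for _ in range(n))
```

Note that in this form $\alpha$ multiplies $x^{\beta}$ directly, so it is
not the scale parameter used by the more common $(x/\lambda)^{k}$ form.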
2. Alternatives to Maximum Likelihood

Let $x_1, x_2, \ldots, x_n$ be the ordered observations of a random sample
from the Weibull distribution. We use the following form for the empirical
cumulative distribution function (CDF):

$$F_n(x_i) = (i - .5)/n \tag{2}$$

and

$$F_n(x) = F_n(x_i), \qquad x_i \le x < x_{i+1}. \tag{3}$$

The estimators which follow (except for the method of moments) select
$(\alpha, \beta)$ so as to minimize the "distance" between
$F(x; \alpha, \beta)$ and $F_n(x)$.
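Equations (2) and (3) can be sketched in Python as follows. The function
names are hypothetical, and the value returned below the smallest
observation (0.0) is an assumption, since (3) does not define $F_n(x)$ for
$x < x_1$.

```python
from bisect import bisect_right

def plotting_positions(sample):
    """Return the ordered sample and F_n(x_i) = (i - 0.5) / n from equation (2)."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [(i - 0.5) / n for i in range(1, n + 1)]

def empirical_cdf(xs_sorted, x):
    """Step function of equation (3): F_n(x) = F_n(x_i) for x_i <= x < x_{i+1}."""
    i = bisect_right(xs_sorted, x)  # number of observations <= x
    if i == 0:
        return 0.0  # assumption: the source leaves F_n undefined below x_1
    return (i - 0.5) / len(xs_sorted)
```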

2.1  Non-Linear  Regression  (NLR) 

In non-linear regression, $(\alpha, \beta)$ are selected so that the
expression

$$\sum_{i=1}^{n} \left( F(x_i; \alpha, \beta) - F_n(x_i) \right)^2 \tag{4}$$

is minimized. This ensures that the model distribution fits the empirical
CDF in the least squares sense. However, costly iterative techniques are
required.
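To make the iterative nature of NLR concrete, here is a minimal sketch
that minimizes (4) with a derivative-free compass search over
$(\ln\alpha, \ln\beta)$. The search method, starting values, and step
settings are assumptions chosen for simplicity; the report does not state
which iterative routine was used, and a library least-squares solver would
normally be preferred.

```python
import math

def nlr_objective(xs, alpha, beta):
    """Sum of squares (4); xs must be sorted in ascending order."""
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        model = 1.0 - math.exp(-alpha * x ** beta)
        total += (model - (i - 0.5) / n) ** 2
    return total

def nlr_fit(sample, alpha0=1.0, beta0=1.0, step=0.5, tol=1e-8, max_iter=10000):
    """Compass search on (ln alpha, ln beta); only improving moves are accepted."""
    xs = sorted(sample)
    u, v = math.log(alpha0), math.log(beta0)
    best = nlr_objective(xs, math.exp(u), math.exp(v))
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = nlr_objective(xs, math.exp(u + du), math.exp(v + dv))
            if trial < best:
                u, v, best = u + du, v + dv, trial
                improved = True
                break
        if not improved:
            step *= 0.5  # shrink the pattern when no direction helps
    return math.exp(u), math.exp(v)
```

Working on the log scale keeps both parameters positive without explicit
constraints. Every accepted move requires a full pass over the data, which
is the cost the later sections try to avoid.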

2.2  The  Log-Linearization  Method  (LLM) 

The non-linear model in 2.1 may be made linear by using logarithms.
Let

$$q_i = 1 - F_n(x_i) \tag{5}$$

and

$$\hat{q}_i = \exp(-\hat{\alpha} x_i^{\hat{\beta}}) \tag{6}$$

where $\hat{\alpha}$ and $\hat{\beta}$ are estimates of $\alpha$ and
$\beta$. Then

$$\ln(-\ln \hat{q}_i) = \ln \hat{\alpha} + \hat{\beta} \ln x_i. \tag{7}$$

We may regard this as a regression model with $\ln(-\ln q_i)$ as the
dependent variable and $\ln x_i$ as the independent variable. Reversing the
roles of dependent and independent variables, we may also write

$$\ln x_i = \ln(-\ln \hat{q}_i)/\hat{\beta} - (\ln \hat{\alpha})/\hat{\beta}. \tag{8}$$

Ordinary least squares may be used to obtain model coefficients from which
estimates of $(\alpha, \beta)$ may be obtained. Using (7), the model
attempts to fit the CDF, while in (8) the model attempts to fit the
percentiles. However, using equations (5) and (6), the non-linear least
squares expression (4) can be written as

$$\sum_{i=1}^{n} (q_i - \hat{q}_i)^2 \tag{9}$$

and it is this sum (9) that non-linear least squares seeks to minimize.
Using the log-linearization method coupled with ordinary least squares, one
is seeking to minimize

$$\sum_{i=1}^{n} \left( \ln(-\ln q_i) - \ln(-\ln \hat{q}_i) \right)^2. \tag{10}$$

Since the sum of squares being minimized is different, the resulting
estimates of $\alpha$ and $\beta$ derived from log-linearization can be
very different from the estimates derived using non-linear regression.
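The log-linearization fit based on (7) reduces to a simple linear
regression, as in the following sketch. The function name is hypothetical,
and the code assumes all observations are strictly positive so that
$\ln x_i$ exists.

```python
import math

def ll_fit(sample):
    """Ordinary least squares of ln(-ln q_i) on ln x_i, following equation (7)."""
    xs = sorted(sample)
    n = len(xs)
    v = [math.log(x) for x in xs]                        # independent variable
    u = [math.log(-math.log(1.0 - (i - 0.5) / n))        # dependent variable
         for i in range(1, n + 1)]
    v_bar = sum(v) / n
    u_bar = sum(u) / n
    sxx = sum((vi - v_bar) ** 2 for vi in v)
    sxy = sum((vi - v_bar) * (ui - u_bar) for vi, ui in zip(v, u))
    beta = sxy / sxx                          # slope estimates beta
    alpha = math.exp(u_bar - beta * v_bar)    # intercept estimates ln(alpha)
    return alpha, beta
```

No iteration is needed: the estimates come from five sums over the data.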


2.3  Weighted  Least  Squares  (WLS) 

If we can weight the regression using the log-linearization method in
such a way that the weighted distance metric is the same as for the
non-linear regression method, then we can achieve the results of the
non-linear regression without using the costly iterative technique. That
is, by putting $u_i = \ln(-\ln q_i)$, $v_i = \ln x_i$, $a = \ln \alpha$ and
$b = \beta$, (7) becomes $u_i = a + b v_i + e_i$, and we solve for values
of $a$ and $b$ which minimize $\sum w_i^2 e_i^2$.

Also, a weighted log-linear result using (7) and using weights $w_i$ may
be obtained by minimizing

$$\sum_{i=1}^{n} w_i^2 \left( \ln(-\ln q_i) - \ln(-\ln \hat{q}_i) \right)^2. \tag{11}$$

Equating expressions (9) and (11) term by term, and solving for $w_i$, we
have

$$1/w_i = \left( \ln(-\ln q_i) - \ln(-\ln \hat{q}_i) \right) / (q_i - \hat{q}_i).$$

As $\hat{q}_i \to q_i$,

$$1/w_i \to \left( \ln(-\ln q_i) \right)',$$

where the prime indicates the derivative with respect to $q_i$. This
derivative equals $1/(q_i \ln q_i)$, and since only $w_i^2$ enters (11) the
sign of $w_i$ is immaterial, so we take

$$w_i = -q_i \ln q_i.$$

Thus, using (11) with $w_i = -q_i \ln q_i$, we have a weighted least
squares method (using ordinary least squares) which is approximately
equivalent to using non-linear least squares. (Similar results are obtained
when the roles of the independent and dependent variables are reversed and
(8) is used. In that case it is easy to show that $w_i = x_i$.)
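A minimal sketch of the weighted fit follows. It assumes, as the
derivation of the weights implies, that the quantity minimized is
$\sum w_i^2 e_i^2$ with $w_i = -q_i \ln q_i$; the function name is
hypothetical.

```python
import math

def wls_fit(sample):
    """Weighted least squares of ln(-ln q_i) on ln x_i with w_i = -q_i ln q_i."""
    xs = sorted(sample)
    n = len(xs)
    q = [1.0 - (i - 0.5) / n for i in range(1, n + 1)]
    u = [math.log(-math.log(qi)) for qi in q]
    v = [math.log(x) for x in xs]
    w2 = [(-qi * math.log(qi)) ** 2 for qi in q]   # squared weights w_i^2
    sw = sum(w2)
    u_bar = sum(wi * ui for wi, ui in zip(w2, u)) / sw   # weighted means
    v_bar = sum(wi * vi for wi, vi in zip(w2, v)) / sw
    sxx = sum(wi * (vi - v_bar) ** 2 for wi, vi in zip(w2, v))
    sxy = sum(wi * (vi - v_bar) * (ui - u_bar) for wi, vi, ui in zip(w2, v, u))
    b = sxy / sxx
    a = u_bar - b * v_bar
    return math.exp(a), b   # (alpha, beta), since a = ln(alpha) and b = beta
```

The weights vanish as $q_i$ approaches 0 or 1, so the extreme order
statistics, which dominate the unweighted log-linear fit, contribute
little here.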

2.4  Generalized  Least  Squares  (GLS) 

Engeman and Keefe (1982) describe a method which takes into account
the fact that the observations must be ordered to calculate the empirical
CDF or percentiles. Putting

$$y_i = \ln x_i, \quad z_i = \ln(-\ln q_i), \quad a = -(\ln \alpha)/\beta, \quad b = 1/\beta, \tag{14}$$

(8) may be written as

$$y_i = a + b z_i. \tag{15}$$

However, the variance-covariance matrix of $Y' = (y_1, y_2, \ldots, y_n)$
is not $\sigma^2 I$ but $\sigma^2 V$. An approximation to $V$ is given by

$$v_{ij} = (1 - q_i)/(q_i \ln q_i \ln q_j) \quad \text{for } i \le j. \tag{16}$$

The resulting estimates from Engeman and Keefe are obtained from the
equation

$$(\hat{a}, \hat{b})' = (Z'V^{-1}Z)^{-1}(Z'V^{-1}Y),$$

where $Z$ consists of a column of ones and a column of the values
$\ln(-\ln q_i)$. Knowing estimates of $a$ and $b$, we can find estimates of
$\alpha$ and $\beta$ since

$$\alpha = \exp(-a/b), \qquad \beta = 1/b.$$
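The GLS estimate can be sketched in pure Python as below. The helper
`solve` is a small hypothetical stand-in for a linear algebra library, $V$
is filled symmetrically from the $i \le j$ formula in (16), and the
function names are assumptions rather than anything taken from Engeman and
Keefe.

```python
import math

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    width = len(M[0])
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, width):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * (width - n) for _ in range(n)]
    for r in range(n - 1, -1, -1):          # back substitution
        for k in range(width - n):
            s = M[r][n + k] - sum(M[r][c] * X[c][k] for c in range(r + 1, n))
            X[r][k] = s / M[r][r]
    return X

def gls_fit(sample):
    """(a, b)' = (Z'V^-1 Z)^-1 (Z'V^-1 Y), then alpha = exp(-a/b), beta = 1/b."""
    xs = sorted(sample)
    n = len(xs)
    q = [1.0 - (i - 0.5) / n for i in range(1, n + 1)]
    y = [math.log(x) for x in xs]
    z = [math.log(-math.log(qi)) for qi in q]
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):               # equation (16), mirrored for j < i
            V[i][j] = V[j][i] = (1.0 - q[i]) / (q[i] * math.log(q[i]) * math.log(q[j]))
    S = solve(V, [[1.0, z[i], y[i]] for i in range(n)])   # V^-1 [1, z, y]
    t11 = sum(S[i][0] for i in range(n))
    t12 = sum(S[i][1] for i in range(n))
    t22 = sum(z[i] * S[i][1] for i in range(n))
    r1 = sum(S[i][2] for i in range(n))
    r2 = sum(z[i] * S[i][2] for i in range(n))
    det = t11 * t22 - t12 * t12
    a = (t22 * r1 - t12 * r2) / det
    b = (t11 * r2 - t12 * r1) / det
    return math.exp(-a / b), 1.0 / b
```

The $n \times n$ matrix $V$ depends only on $n$, not on the data, so it
could be factored once and reused across samples of the same size.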


2.5 Method of Moments (MM)

Menon (1963) used the method of moments to obtain

$$b = (6 s^2 / \pi^2)^{1/2}$$

$$a = \bar{x}' + .5772\, b$$

where $s^2$ and $\bar{x}'$ are the sample variance and mean, respectively,
of $\ln x_i$. We then calculate

$$\alpha = \exp(-a/b), \qquad \beta = 1/b.$$
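Menon's estimator needs only the mean and variance of the logged data, as
this sketch shows. The function name is hypothetical, and the $n - 1$
divisor for the sample variance is an assumption, since the text does not
say which divisor was used.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the constant rounded to .5772 in the text

def menon_mm_fit(sample):
    """Method-of-moments estimates of (alpha, beta) from ln x_i."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((y - mean) ** 2 for y in logs) / (n - 1)  # assumed n - 1 divisor
    b = math.sqrt(6.0 * var) / math.pi                  # b = (6 s^2 / pi^2)^(1/2)
    a = mean + EULER_GAMMA * b
    return math.exp(-a / b), 1.0 / b
```

A quick sanity check on the formulas: squaring every observation doubles
each $\ln x_i$, which leaves the estimate of $\alpha$ unchanged and halves
the estimate of $\beta$, as expected for a Weibull variable raised to a
power.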


3.  Simulation  Results 

The five estimation methods described in the previous section, the
log-linear (LL), weighted least squares (WLS), non-linear regression (NLR),
generalized least squares (GLS), and Menon's method of moments (MM), were
used on a number of generated data sets. All of the methods except MM
could be approached from two directions: $\ln x$ could be considered the
dependent variable, or $\ln(-\ln q)$ could be considered the dependent
variable. LL, WLS, and NLR generally gave better results when
$\ln(-\ln q)$ was used as the dependent variable. GLS gave much better
results when $\ln x$ was used as the dependent variable. Therefore, we
present results for $\ln(-\ln q)$ as the dependent variable when LL, WLS,
or NLR was used, and $\ln x$ when GLS was used.

Samples of size $n = 25$ were generated from each of the distributions
in the study. For each distribution $N = 30$ replications were used.
Several Weibull distributions were used with $\alpha$ = .001, .01, .1, 1,
10 and $\beta$ = .5, 1, 2, 4. Since a number of the combinations yielded
nearly the same results, we report only the following $(\alpha, \beta)$
combinations: (1,1), (10,1), (1,2), and (1,4).

Contamination and censoring are two factors which can significantly
affect the estimates. One form of contamination was simulated by randomly
taking 8% (2 out of 25) of the observations and transforming each selected
observation $x$ to $a + bx$. Values of $(a, b)$ were (1,1), (.5,2) and
(0,4). Only $a = .5$, $b = 2$ is reported in Table 3.2, since the other
values yielded similar results and this set of values gives the most
contrast between the methods. Censoring was simulated by keeping a
percentage of the smallest observations. We used 40%, 60%, and 80% as
percentages of observations kept.
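The contamination and censoring schemes described above can be sketched as
follows. The function names are hypothetical, and rounding the counts to
the nearest integer is an assumption that reproduces the stated 2 out of
25.

```python
import random

def contaminate(sample, a, b, fraction, rng):
    """Replace a random fraction of the observations x by a + b * x."""
    xs = list(sample)
    k = round(fraction * len(xs))            # 8% of 25 gives 2
    for idx in rng.sample(range(len(xs)), k):
        xs[idx] = a + b * xs[idx]
    return xs

def censor_keep_smallest(sample, keep_fraction):
    """Keep the given fraction of the smallest observations, in order."""
    xs = sorted(sample)
    k = round(keep_fraction * len(xs))
    return xs[:k]
```

When fitting a censored sample, the plotting positions presumably still
use the original $n$ in $(i - .5)/n$ for the retained observations; the
text does not state this, so it is an assumption for anyone reproducing
Table 3.3.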

The robustness of the estimation methods was also tested by using the
Weibull model on data generated from another distribution. We used the
normal distribution with $\mu = 3.5$, $\sigma^2 = 1$ and with $\mu = 6$,
$\sigma^2 = 1$, the log-normal with $\mu = 0$, $\sigma^2 = 1$, and the
"log-Cauchy" distribution. The log-Cauchy distribution was obtained by
generating a Cauchy observation $x$ and transforming it by $e^x$. The
location and scale parameters were 0 and .1, respectively.

Error evaluations were recorded for all of the different cases. The
root mean square (RMS) error was calculated with respect to the true
underlying distribution, and the RMS error of the parameters was also
calculated. The formulas used are given by

$$\mathrm{RMS} = \left( \sum_{j=1}^{N} \sum_{i=1}^{n} \left( F(x_{ij}) - F(x_{ij}; \hat{\alpha}_j, \hat{\beta}_j) \right)^2 / (Nn) \right)^{1/2}$$

$$\mathrm{RMS}_{\alpha} = \left( \sum_{j=1}^{N} (\alpha - \hat{\alpha}_j)^2 / N \right)^{1/2}$$

$$\mathrm{RMS}_{\beta} = \left( \sum_{j=1}^{N} (\beta - \hat{\beta}_j)^2 / N \right)^{1/2}.$$

$F(x_{ij})$ is the value of the true underlying distribution for the $i$th
observation in the $j$th generated sample. The estimates
$\hat{\alpha}_j$ and $\hat{\beta}_j$ are the values of $\hat{\alpha}$ and
$\hat{\beta}$ obtained from the $j$th sample. Tables 3.1 through 3.3 give
the results for the simulated problems.
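The three error measures can be computed as in the sketch below. The
function names and argument layout are illustrative assumptions; `true_cdf`
is any callable giving the true underlying CDF, which need not be Weibull
in the robustness cases.

```python
import math

def weibull_cdf(x, alpha, beta):
    """Fitted model CDF in the form of equation (1)."""
    return 1.0 - math.exp(-alpha * x ** beta) if x > 0 else 0.0

def rms_cdf(samples, estimates, true_cdf):
    """RMS difference between the true CDF and each fitted Weibull CDF.

    samples   : list of N samples, each a list of observations
    estimates : list of N (alpha_hat, beta_hat) pairs, one per sample
    """
    total = 0.0
    count = 0
    for xs, (a_hat, b_hat) in zip(samples, estimates):
        for x in xs:
            total += (true_cdf(x) - weibull_cdf(x, a_hat, b_hat)) ** 2
            count += 1            # equals N * n when all samples have size n
    return math.sqrt(total / count)

def rms_param(true_value, estimates):
    """RMS error of a parameter over the N replications."""
    n_rep = len(estimates)
    return math.sqrt(sum((true_value - e) ** 2 for e in estimates) / n_rep)
```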


TABLE 3.1

RMS, RMS$_\alpha$, RMS$_\beta$ for the Fit to the Known Weibull Distribution
(All combinations of $(\alpha, \beta)$ gave essentially the same results)

                  LL      WLS     GLS     NLR     MM
RMS               .056    .061    .055    .064    .059
RMS$_\alpha$      .230    .228    .221    .263    .252
RMS$_\beta$       .216    .212    .194    .251    .252

TABLE 3.2

RMS, RMS$_\alpha$, RMS$_\beta$ for the Fit to the Known Weibull Distribution
When 8% of the Generated Values Have Been Replaced by a Contaminated Value

                  LL      WLS     GLS     NLR     MM

($\alpha$ = 1, $\beta$ = 1)
RMS               .060    .059    .057    .060    .060
RMS$_\alpha$      .202    .183    .199    .194    .210
RMS$_\beta$       .206    .200    .194    .277    .233

($\alpha$ = 10, $\beta$ = 1)
RMS               .074    .065    .072    .063    .072
RMS$_\alpha$      3.81    4.01    4.39    5.75    3.68
RMS$_\beta$       .150    .160    .179    .197    .147

($\alpha$ = 1, $\beta$ = 2)
RMS               .072    .062    .069    .061    .070
RMS$_\alpha$      .228    .180    .208    .187    .228
RMS$_\beta$       .348    .369    .369    .402    .359

($\alpha$ = 1, $\beta$ = 4)
RMS               .122    .080    .116    .063    .112
RMS$_\alpha$      .328    .186    .269    .192    .324
RMS$_\beta$       1.16    1.15    1.50    .78     .87

Note: A contaminated observation is .5 + 2x, where x is the true
observation; all other contaminations gave similar results.

TABLE 3.3

RMS Values for the Fit to the Known Weibull Distribution Where
a Percentage of the Largest Values Have Been Removed or Censored

Distribution                % Censored   LL      WLS     GLS     NLR     MM

Weibull                     60           .123    .107    .100    .105    -
($\alpha$ = 1, $\beta$ = 4) 40           .089    .080    .079    .081    -
                            20           .070    .067    .063    .069    -
                            0            .056    .061    .055    .064    .059

Normal                      60           .104    .092    .097    .091    -
N(6,1)                      40           .080    .074    .076    .075    -
                            20           .064    .066    .064    .065    -
                            0            .061    .062    .059    .063    .063

Log-Normal                  60           .120    .099    .111    .113    -
($\mu$ = 0, $\sigma^2$ = 1) 40           .091    .078    .082    .080    -
                            20           .072    .071    .070    .069    -
                            0            .071    .067    .067    .067    .073

Log-Cauchy*                 0            .165    .118    .132    .075    .153

*The Log-Cauchy was formed using the Cauchy probability density function
for $x$ given by $s/(\pi(s^2 + (x-m)^2))$, where $s = .1$ and $m = 0$. If
$x$ has a Cauchy pdf, then the Log-Cauchy variable is $\exp(x)$.

Table 3.1 gives a comparison of the various methods under ideal
conditions. That is, the model distribution is correct, and there is no
contamination or censoring of the data. The values are essentially the
same for all $(\alpha, \beta)$ combinations with the exception of
$\alpha = 10$, $\beta = 1$, in which case the RMS$_\alpha$ is approximately
20 times larger than the other RMS, RMS$_\alpha$, and RMS$_\beta$ values,
but the pattern is the same. The GLS method outperforms the other methods
with respect to the fit of the CDF and the estimation of
$(\alpha, \beta)$. It is interesting to note that the LL method is close
to GLS. This is important for cases in which computer storage space is
limited and the sample size is large, since GLS requires an additional
$n \times n$ matrix.

Table 3.2 illustrates the effects of contamination. WLS and NLR
appear to be very stable with respect to the fit of the CDF. LL, GLS,
and MM tend to degrade for larger $\alpha$ or $\beta$ values.

Table  3.3  shows  that  GLS  does  slightly  better  than  WLS  and  NLR  when 
the  underlying  distribution  is  Weibull  and  censoring  is  present.  Also 
worth  noting  is  that  the  LL  method  appears  to  be  more  adversely  affected  by 
censoring  than  the  other  methods. 

Table 3.3 also shows the robustness properties of the estimators. It
is interesting that the Weibull can approximate a normal distribution quite
well when no censoring is present, whichever estimation method is used.
Another interesting case is that of the log-Cauchy distribution. Only NLR
was able to give satisfactory results, and the fit is surprisingly good
considering the underlying distribution.

4. Summary and Conclusions

We have examined the small sample properties of five methods of
parameter estimation for the two-parameter Weibull distribution using
simulation. Under ideal conditions, GLS is the best estimator and LL is a
close second with respect to the fit of the CDF and the RMS of the
parameters. When contamination is present, WLS and NLR are very stable,
whereas the other methods give much poorer results, especially when
$\alpha$ or $\beta$ is larger than 2. NLR also gives much better results
than the other methods when the underlying distribution is the log-Cauchy
distribution. WLS, GLS, and NLR perform well when the underlying
distribution is the log-normal distribution.

Computationally, LL and MM are certainly the least expensive. They are
easy to use even on a hand calculator, and the results they give are not
far from those of GLS under ideal conditions. However, LL and MM do not
offer the robustness of the other three methods. The robustness of WLS,
GLS, and NLR comes at a cost in complexity and storage. NLR requires a
fairly complex program or a software package to implement. GLS requires an
additional $n \times n$ matrix, which can be prohibitive for large sample
sizes. This problem can probably be overcome by grouping the data if
storage is a problem. WLS requires only $n$ weights in addition to what
the LL method needs.

The  WLS  method  appears  to  be  the  most  cost  effective  method  of  the 
five  examined  in  this  study.  The  NLR  method  appears  to  be  the  most  robust. 

If  we  consider  the  log-Cauchy  to  be  an  extreme  case  of  a  wrong  model  using 
the  Weibull,  then  we  may  conclude  that  NLR  is  an  extremely  robust 
procedure. 